Fast Lempel-Ziv Decompression in Linear Space

Authors

  • Philip Bille
  • Mikko Berggren Ettienne
  • Travis Gagie
  • Inge Li Gørtz
  • Nicola Prezza
Abstract

We consider the problem of decompressing the Lempel-Ziv 77 representation of a string S ∈ [σ]^n using a working space as close as possible to the size z of the input. The folklore solution for the problem runs in optimal O(n) time but requires random access to the whole decompressed text. A better solution is to convert LZ77 into a grammar of size O(z log(n/z)) and then stream S in optimal linear time. In this paper, we show that O(n) time and O(z) working space can be achieved for constant-size alphabets. On larger alphabets, we describe (i) a trade-off achieving O(n log^δ σ) time and O(z log^{1−δ} σ) space for any 0 ≤ δ ≤ 1, and (ii) a solution achieving optimal O(n) time and O(z log log n) space. Our solutions can, more generally, extract any specified subsequence of S with little overhead on top of the optimal running time and working space. As an immediate corollary, we show that our techniques yield improved results for pattern matching problems on LZ77-compressed text.
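The folklore solution mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm of the paper. The phrase format (src, length, ch) and the function name lz77_decode are assumptions made for illustration only, since the abstract does not fix an encoding.

```python
from typing import List, Tuple

# Hypothetical phrase format (an assumption, not taken from the paper):
# (src, length, ch) means "copy `length` characters starting at 0-based
# position `src` of the text decoded so far, then append the character `ch`".
Phrase = Tuple[int, int, str]


def lz77_decode(phrases: List[Phrase]) -> str:
    """Folklore LZ77 decoding: linear time, but the whole output stays in memory."""
    out: List[str] = []
    for src, length, ch in phrases:
        if length > 0 and not (0 <= src < len(out)):
            raise ValueError("phrase source must point into the decoded text")
        # Copy one character at a time so that a source overlapping the
        # phrase being written (a self-referencing copy) still works.
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Three phrases: "a", "b", then a self-overlapping copy followed by "c".
    print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (0, 4, "c")]))
```

The list out holds all n decoded characters because later phrases may copy from any earlier position. This is the random access to the whole decompressed text that the abstract identifies as the drawback of the folklore approach.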


Similar resources

Byte pair encoding: a text compression scheme that accelerates pattern matching

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...


Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in many diverse applications, including data compression, text indexing, and pattern discovery. We describe new linear time LZ factorization algorithms, some of which require only 2n log n + O(log n) bits of working space to factorize a string of length n. These are the most space efficient linear time al...


A decompression pipeline for accelerating out-of-core volume rendering of time-varying data

This paper presents a decompression pipeline capable of accelerating out-of-core volume rendering of time-varying scalar data. Our pipeline is based on a two-stage compression method that cooperatively uses the CPU and GPU (graphics processing unit) to transfer compressed data entirely from the storage device to the video memory. This method combines two different compression algorithms, namely ...


A Lempel-Ziv Compressed Structure for Document Listing

Document listing is the problem of preprocessing a set of sequences, called documents, so that later, given a short string called the pattern, we retrieve the documents where the pattern appears. While optimal-time and linear-space solutions exist, the current emphasis is on reducing the space requirements. Current document listing solutions build on compressed suffix arrays. This paper is the ...


Efficient Text Compression Using Special Character Replacement and Space Removal

In this paper, we propose a new text compression/decompression algorithm that uses a special character replacement technique. After the initial compression by special character replacement, we remove the spaces between words in the intermediate compressed file in specific situations to obtain the final compressed text file. Experimental results show that the proposed...



Journal:
  • CoRR

Volume: abs/1802.10347

Publication date: 2018